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In the study of networked systems such as biological, technological, and social networks the 
available data are often uncertain. Rather than knowing the structure of a network exactly, we 
know the connections between nodes only with a certain probability. In this paper we develop 
methods for the analysis of such uncertain data, focusing particularly on the problem of community 
detection. We give a principled maximum-likelihood method for inferring community structure and 
demonstrate how the results can be used to make improved estimates of the true structure of the 
network. Using computer-generated benchmark networks we demonstrate that our methods are 
able to reconstruct known communities more accurately than previous approaches based on data 
thresholding. We also give an example application to the detection of communities in a protein- 
protein interaction network. 


I. INTRODUCTION 

Many systems of scientific interest can be usefully rep¬ 
resented as networks and the last few years have seen a 
surge of interest in the study of networks, due in part 
to the fruitful application of a range of techniques drawn 
from physics jT] . Most current techniques for the analysis 
of networks begin with the assumption that the network 
data available to us are reliable, a faithful representation 
of the true structure of the network. But many real-world 
data sets, perhaps most of them, in fact contain errors 
and inaccuracies. Thus, rather than representing a net¬ 
work by a set of nodes joined by binary yes-or-no edges, 
as is commonly done, a more realistic approach would 
be to specify a probability or likelihood of connection 
between every pair of nodes, representing our certainty 
(or uncertainty) about the existence of the corresponding 
edge. If most of the probabilities are close to zero or one 
then the data are reliable—for every node pair we are 
close to being certain that it either is or is not connected 
by an edge. But if a significant fraction of pairs have a 
probability that is neither close to zero nor close to one 
then we are uncertain about the network structure. In re¬ 
cent years an increasing number of network studies have 
started to provide probabilistic estimates of uncertainty 
in this way, particularly in the biological sciences. 

One simple method for dealing with uncertain net¬ 
works is thresholding: we assume that edges exist when¬ 
ever their probability exceeds a certain threshold that we 
choose. In work on protein-protein interaction networks, 
for example, Krogan et al. [2] assembled a sophisticated 
interaction data set that includes explicit estimates of 
the likelihood of interaction between every pair of pro¬ 
teins studied. To analyze their data set, however, they 
then converted it into a conventional binary network by 
thresholding the likelihoods, followed by traditional net¬ 
work analyses. While this technique can certainly reveal 
useful information, it has some drawbacks. First, there 
is the issue of the choice of the threshold level. Kro¬ 


gan et al. used a value of 0.273 for their threshold, but 
there is little doubt that their results would be different if 
they had chosen a different value and little known about 
how to choose the value correctly. Second, thresholding 
throws away potentially useful information. There is a 
substantial difference between an edge with probability 
0.3 and an edge with probability 0.9, but the distinction 
is lost if one applies a threshold at 0.273—both fall above 
the threshold and so are considered to be edges. Third, 
and more subtly, thresholded probability values fail to 
satisfy certain basic mathematical requirements, mean¬ 
ing that thresholded networks are essentially guaranteed 
to be wrong, often by a wide margin. If, for instance, 
we have 100 node pairs connected with probability 0.5 
each, then on average we expect 50 of those pairs to be 
connected by edges. If we place a threshold on the proba¬ 
bility values at, say, 0.273, however, then all 100 of them 
will be converted into edges, a result sufficiently far from 
the expected value of 50 as to have a very low chance of 
being correct. 

In this paper we develop an alternative and princi¬ 
pled approach to the analysis of uncertain network data. 
We focus in particular on the problem of community 
detection in networks, one of the best studied analysis 
tasks. We make use of maximum-likelihood inference 
techniques, whose application to networks with definite 
edges is well developed EHB1- Here we extend those de¬ 
velopments to uncertain networks and show that the re¬ 
sulting analyses give significantly better results in con¬ 
trolled tests than thresholding methods. As a corollary, 
our methods also allow us to estimate which of the uncer¬ 
tain edges in a data set is mostly likely to be a true edge 
and hence reconstruct, in a probabilistic fashion, the true 
structure of the underlying network. 

A number of authors have looked at related ques¬ 
tions in the past. There exists a substantial literature 
on the analysis of weighted networks, meaning networks 
in which the positions of the edges are exactly known 
but the edges carry varying weights, such as strengths, 
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lengths, or volumes of traffic. Such weighted networks 
are somewhat similar to the uncertain networks studied 
in this paper—edges can be either strong or weak in a cer¬ 
tain sense—but at a deeper level they are different. For 
instance, the data sets we consider include probabilities 
of connection for every node pair, whereas weighted net¬ 
works have weights only for node pairs that are known 
to be connected by an edge. More importantly, in our 
uncertain networks we imagine that there is a definite 
underlying network but that it is not observed; all we 
see are noisy measurements of the underlying truth. In 
weighted networks the data are considered to be exact 
and true and the variation of edge weights represents an 
actual physical variation in the properties of connections. 

Methods for analyzing weighted networks include sim¬ 
ple mappings to unweighted networks and generalizations 
of standard methods to the weighted case [7]. Inference 
methods, akin to those we use here, have also been ap¬ 
plied to the weighted case [5] and to the case of affin¬ 
ity matrices, as used for example in computer vision for 
image segmentation |9.]. A little further afield, Harris 
and Srinivasan m have looked at network failures in a 
noisy network model in which edges are deleted with uni¬ 
form probability, while Saade et al. HH use spectral tech¬ 
niques to detect node properties, but not community af¬ 
filiations, when the underlying network is known but the 
node properties depend on noisy edge labels. In related 
work, Xu et al. TS: have studied the prediction of edge 
labels using inference methods and Kurihara et al. [13] 
have applied inference to a case where the data give the 
frequency of interaction between nodes. 


II. METHODS 

We focus on the problem of community detection in 
networks whose structure is uncertain. We suppose that 
we have data which, rather than specifying with certainty 
whether there is an edge between two nodes i and j, gives 
us only a likelihood or probability Qij that there is an 
edge. We will assume that the probabilities are indepen¬ 
dent. Correlated probabilities are certainly possible, but 
the simple case of independent probabilities already gives 
many interesting results, as we will see. 

At the most basic level our goal is to classify the nodes 
of the network into non-overlapping communities— 
groups of nodes with dense connections within groups 
and sparser connections between groups, also known as 
“assortative” structure. More generally we may also be 
interested in disassortative structures in which there are 
more connections between groups than within them, or 
mixed structures in which different groups may be ei¬ 
ther assortative or disassortative within the same net¬ 
work. Conceptually, we assume that even though our 
knowledge of the network is uncertain, there is a definite 
underlying network in which each edge either exists or 
does not, but we cannot see this network. The under¬ 
lying network is assumed to be undirected and simple 


(i.e., it has no multi-edges or self-edges). The edge prob¬ 
abilities we observe are a noisy representation of the true 
network, but they nonetheless can contain information 
about structure—enough information, as we will see, to 
make possible the accurate detection of communities in 
many situations. 

Our approach to the detection problem takes the clas¬ 
sic form of a statistical inference algorithm. We propose 
a generative model for uncertain community-structured 
networks, then fit that model to our observed data. The 
parameters of the fit tell us about the community struc¬ 
ture. 

A. The model 

The model we use is an extension to the case of uncer¬ 
tain networks of the standard stochastic block model, a 
random graph model widely used for community struc¬ 
ture analyses EGS 115] . In the conventional definition 
of the stochastic block model, a number n of nodes are 
distributed at random among k groups, with a probabil- 
ity q r of being assigned to group r, where 2^ r=1 q r = 1 . 
Then undirected edges are placed independently at ran¬ 
dom between node pairs with probabilities io rs that de¬ 
pend only on the groups r, s that a pair belongs to and 
nothing else. If the diagonal elements co rr of the proba¬ 
bility matrix are significantly larger than the off-diagonal 
entries then one has traditional assortative community 
structure, with a higher density of connections within 
groups than between them. But one can also make the 
diagonal entries smaller to generate disassortative struc¬ 
ture or mixed structure types. 

Given the parameters and w rs , one can write down 
the probability, or likelihood, that we generate a partic¬ 
ular network in which node i is assigned to group gt and 
the placement of the edges is described by an adjacency 
matrix A with elements = 1 if there is an edge be¬ 
tween nodes i and j and 0 otherwise: 

P(A,g|7,u;) = P(g|7)P(A|g,u;) 

- II 7* II (1 - u gt9j y- A 'i. (1) 

i i<j 

Here 7 represents the vector of group probabilities and 
u> represents the matrix of probabilities tu rs . 

In extending the stochastic block model to uncertain 
networks we imagine a multi-step process, illustrated in 
Fig- [3 in which the network is first generated using the 
standard stochastic block model and then the definite 
edges and non-edges are replaced by probabilities, ef¬ 
fectively adding noise to the network data. The exact 
shape of the noise will depend on the detailed effects of 
the experimental procedure used to measure the network, 
which we assume to be unknown. We assume only that 
the edge likelihoods are true probabilities in a sense de¬ 
fined below (see Eq. Q). Remarkably, however, it still 
turns out to be possible to perform precise inference on 
the data. 
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Random network model Network instance 

Aij = 1 

Aij — 0 



Noise process 



FIG. 1: The model of uncertain network generation used in our calculations. A community assignment g and network A are 
drawn from a random network model such as the stochastic blockmodel. The experimental uncertainty is represented by giving 
each pair of nodes i,j a probability Qij of being connected by an edge, drawn from different distributions for edges A rj — 1 
and non-edges Aij = 0 . 


We represent the noise process by two unknown func¬ 
tions. The function j3\ (Q) represents the probability den¬ 
sity on the interval from 0 to 1 that a true edge between 
two nodes in the original (unobserved) network gives rise 
to a measured probability Q of connection between the 
same nodes in the observed (probabilistic) data. Con¬ 
versely, the function (3o(Q ) represents the probability 
density that a non-edge gives rise to probability Q. 

Given these two functions, we can write an expression 
for the probability (technically, probability density) that 
a true network represented by adjacency matrix A gives 
rise to a matrix of observed edge probabilities Q = {Qij} 
thus: 

P{ Q|A) = \[MQ,.,y [A ■ (2) 

i<j 

The crucial observation that makes our calculations 
possible is that the functions /3q and j3\ are not indepen¬ 
dent, because the numbers Qij that they generate are 
not just any edge weights but are specifically probabili¬ 
ties and are assumed to be independent. If we were to 
gather together all node pairs that have probability Q of 
being connected by an edge, the independence assump¬ 
tion implies that a fraction Q of them on average should 
in fact be connected by edges and the remainder should 
be non-edges. For example, 90% of all node pairs with 
Qij = 0.9 should, in expectation, be connected by edges. 

If there are to edges in total in our underlying true 
network, then there are m/3i{Q) d Q edges with ob¬ 
served probability lying between Q and Q + d Q and 
[( 2 ) — m ]Po(Q) d Q non-edges in the same interval. Hence 
for every possible value of Q we must have 

_ m/3 1 {Q) dQ _ = , , 

to/3i(Q) dQ + ((”) - m)/3 0 (Q) d Q 

Rearranging, we then find that 

Pi(Q) = Q/p m 

MQ) (1 - <3)/(l ~ P) ’ U 


where 


P = 



(5) 


is the so-called density of the network, the fraction of 
possible edges that are in fact present. Since we don’t 
know the true network, we don’t normally know the value 
of to, but it can be approximated by the expected number 
of edges Qij , which becomes an increasingly good 
estimate as the network gets larger, and from this figure 
we can calculate p. 

Note that Eq. Q implies that /3o(l) = 0 and /3i(0) = 0. 
The equation is also compatible with the choice /3q(Q) = 
HQ), Pi(Q) = HQ ~ 1)) where <5(rc) is the Dirac delta 
function, which corresponds to the conventional case of 
a perfectly certain network with Qij = Aij. 

Using Eq. @ we can now write Eq. © as 


P(Q|A) = I] -LJLfoiQij) 

• , -L tyij 

r<^ 1 J 


><n 


Qi,, \ ( 1 Qij 

p ) V 1 -p 


1 Aij 


( 6 ) 


The first product is a constant for any given set of ob¬ 
served probabilities Q and hence will have no effect on 
our maximum-likelihood calculations (which depend only 
on the position of the likelihood maximum and not on its 
absolute value). Henceforth, we will neglect this factor. 
Then we combine Eqs. (jT|) and ([ 6 ]) to get an expression 
for the likelihood of the data Q and the community as¬ 
signments g, neglecting constants and given the model 
parameters 7 and a;: 


-P(Q, g|7> <*>) = ^2 -P(Q! A)P(A, g|7, u) 


= IM E 

i i<j Aij =0,1 


Q'.i-^'i.n, 

Aij 

(1 Qij){ 1 ^9i9j) 

P 


1 ~ P 


1 -Ai 


n 7 *n —r 


Qij^gigj , (1 Qij)H w g>Sj) 


i<j <- 


1 -p 


(7) 


Our goal is now, given a particular set of observed 
data Q, to maximize this likelihood to find the best-fit 
parameters 7 and uj. In the process we will determine the 
community assignments g as well (which are frequently 
the primary objects of interest). 
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B. Fitting to empirical data 

Fitting the model to an observed but uncertain net¬ 
work, represented by the probabilities Qij , means deter¬ 
mining the values of the parameters 7 and u> that maxi¬ 
mize the probability of generating the particular data we 
see. In other words, we want to maximize the marginal 
likelihood of the data given the parameters: 

-P(Q|7,w) = £P(Q,g|7,<*>)- (8) 

g 

Equivalently, we can maximize the logarithm of this 
quantity, which gives the same result (since the logarithm 
is a monotone function) but is often easier. 

Direct maximization by differentiation gives rise to a 


set of implicit equations that have no simple solution, so 
instead we employ a standard trick from the statistics 
toolbox and apply Jensen’s inequality, which says that 
for any set of positive-definite quantities Xi, the log of 
their sum satisfies 

log£^E^log-, (9) 

i i qi 

where qi is any probability distribution over i satisfying 
the normalization condition ]T7 qi = 1 . One can easily 
verify that the exact equality is achieved by choosing 


i x i 


Applying Jensen’s inequality to ([ 8 ]), we get 

P(Q,g|7,w) 

, 9fgJ 
g 


log P(Q| 7 , w) > £ q( g) log 


9(g) 


= £ 9(g) £ log 7 ai + | £ 9(g) £ log 


Q ij 


ij^9i9j 


(1 Qij)( 1 w gi9j ) 


1 - P 


= £ £ 9r l0g7r + 5 £ £ 9% log 


ij rs 


QijUJrs (1 Q i j ) (1 ^rs ) 

P 1 ~P 


- £ 9 (g) log 9 (g) 

g 

-£ 9 (g) log 9 (g), ( 11 ) 


where q l r is the marginal probability within the probabil¬ 
ity distribution q( g) that node i belongs to community r: 

9r = £9(g)<W, (12) 

g 


and qj s is the joint marginal probability that nodes i 
and j belong to communities r and s respectively: 

9ri = £9(g)^,r^, SI (13) 

g 


with Sij being the Kronecker delta. 

Following Eq. (101, the exact equality in © , and 
hence the maximum of the right-hand side, is achieved 
when 


9(g) = 


P(Q,g|7,u;) 

Eg P (Q>g|7,^) 


II. 7 9i II,<J 

Qij^9i9j , (1 


p 1 1 -p 

Ili ^ 9i Ui<j 

Qij^9i9j . (1 U gi9j) 

p 1 1 -p 


(14) 


Thus, calculating the maximum of the left-hand side 
of ( |Tl] ) with respect to the parameters 7 , oj is equivalent 
to a double maximization of the right-hand side with re¬ 
spect to q( g) (by choosing the value above) so as to make 


the two sides equal, and then with respect to the parame¬ 
ters. At first sight, this seems to make the problem more 
complex, but numerically it is in fact easier—the double 
maximization can be achieved in a relatively straight¬ 
forward manner by alternately maximizing with respect 
to g(g) using Eq. (14) and then with respect to the pa¬ 
rameters. Such alternate maximizations can trivially be 
shown always to converge to a local maximum of the log- 
likelihood. They are not guaranteed to find the global 
maximum, however, so commonly we repeat the entire 
calculation several times from different starting points 
and choose among the results the one which gives the 
highest value of the likelihood. 

Once we have converged to the maximum, the final 
value of the probability distribution q( g) is given by 
Eq. (fl4|) to be 


9(g) = 


P(Q,g|7,^) 

P(Q|7,w) 


= P(g|Q,7,^)- 


(15) 


In other words, q( g) is the posterior distribution over 
community assignments g given the observed data Q 
and the model parameters. Thus, in addition to telling 
us the values of the parameters, our calculation tells us 
the probability of any assignment of nodes to communi¬ 
ties. Specifically, the one-node marginal probability q l r , 
Eq. (12), tells us the probability that node i belongs to 


community r and, armed with this information, we can 
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calculate the most probable community that each node 
belongs to, which is the primary goal of our calculation. 

We still need to perform the maximization of (11) over 
the parameters. We note first that the final sum is in¬ 
dependent of either 7 or uj and hence can be neglected. 
Maximization of the remaining terms with respect to 7 is 
straightforward. Differentiating with respect to 7 r , sub¬ 


ject to the normalization condition 7 r = 1 , gives 


= -£«*■ (16) 
n L ' 


Maximization with respect to w is a little more tricky. Only the second term in © depends on uj. but direct 
differentiation of this term yields a difficult equation, so instead we apply Jensen’s inequality 0 again, giving 


£ <&. log 


QijUJrs (1 Q zj) (1 k-Vs) 

P 1 ~P 


^££< 


trs lo § 


QijM rs 


pU 


+ (1 - O log 


(1 Qij)( 1 M rs ) 

(1 - p){ 1 - tli) 


(17) 


where t l r J s is any number between zero and one. 


r 


The exact equality, and hence the maximum of the right- 
hand side, is achieved when 


til = 


QijM rs f p 


QijM rs /p~\~ (1 Qij)(. 1 M rs )![\ p) 


(18) 


Thus, by the same argument as previously, we can maxi¬ 
mize the left-hand side of ( |17[ ) by repeatedly maximizing 
the right-hand side with respect to t l j s using Eq. (18) 


and with respect to M rs by differentiation. Performing 
the derivative and setting the result to zero, we find that 
the maximum with respect to uj rs falls at 


Hrs rs 

V 7/M 


(19) 


The optimal values of the M rs can now be calculated by 
iterating Eqs. (18) and (19) alternately to convergence 


from a suitable initial condition. 

The quantity t l r 3 s has a simple physical interpretation, 
as we can see by applying Eq. © to @, giving 


tll = 


Mrs Pi {Q ij ) 


MrsPl{Qij) T (1 M rs )Pl)( K Qi 


( 20 ) 


But by definition 


and hence 


til = 


U) r s 

= P{ A ij = 1| 9i 

= r. 

II 

( 21 ) 

Pi {Qij) 

= P{Qij\ A ij = 

i), 


( 22 ) 

PoiQij) 

= P{Qij\ A ij = 

0 ), 


(23) 

P(Aj 

= l|5i = r,gj = 


1 ) 


P(Qij\9i = T. 

.ft = 

= s) 


P(A l3 - 

= 1| Qij,gi = r,gj = 

s). 

(24) 


In other words, t l r 3 s is the posterior probability that there 
is an edge between nodes i and j. given that they are in 
groups r and s respectively. This quantity will be useful 


shortly when we consider the problem of reconstructing 
a network from uncertain observations. 

We now have a complete algorithm for fitting our 
model to the observed data. The steps of the algorithm 
are as follows: 

1. Make an initial guess (for instance at random) for 
the values of the parameters 7 and uj. 


2. Calculate the distribution q( g) from Eq. (14). 


3. Calculate the one- and two-node marginal proba¬ 
bilities ql and qp s from Eqs. (12) and (13). 


4. From these quantities calculate updated values of 7 
from Eq. (16) and m by iterating Eqs. (18) and (19) 


to convergence starting from the current estimate 
of uj. 


5. Repeat from step 2 
rameters converge. 


until q{ g) and the model pa- 


Algorithms of this type are known as expectation- 
maximization or EM algorithms HM3. The end result 
is a maximum likelihood estimate of the parameters 7 
and uj along with the posterior distribution over commu¬ 
nity assignments q( g) and the probability t'; 3 s of an edge 
between any pair of nodes. 

Equation (19) can usefully be simplified a little further, 
in two ways. First, note that Eq. © implies that t l3 s = 0 
whenever Q l;j = 0. All of the real-world data sets we 
have examined are sparse , meaning that a large majority 
of the probabilities Qu are zero. This means that most 


of the terms in the numerator of (19) vanish and can be 


dropped from the sum, which speeds up the calculation 
considerably. Indeed t l r 3 s need not be evaluated at all for 
node pairs i. j such that Qij = 0 , since this sum is the 
only place that t\ 3 a appears in our calculation. Moreover 
it turns out that we need not evaluate q l r 3 s for such node 
pairs either. The only other place that q] 3 s appears is in 
the denominator of Eq. (19), which can be simplified by 
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using Eq. © to rewrite it thus: 

£ q rs = £ ?(g) £ ‘W £ S 9j.s = ( n r n *) > ( 25 ) 

ij g i j 

where (...) indicates an average over q( g) and n r = 
Hi $gi,r is the number of nodes in group r, for community 
assignment g. For large networks the number of nodes in 
a group becomes tightly peaked about its mean value so 
that (n r n s ) ~ (n r ) (n a ) where (n r ) = H s 9(g) Hi <W = 
HiQr- Hence 


V t'l 

A^ij Hrs rs 

Hi Qr Hj '/ ' 


(26) 


This obviates the need to calculate q l j s for node pairs such 
that Qij = 0 (which is most node pairs), and in addition 
speeds the calculation further because the denominator 
can now be evaluated in time proportional to the number 
of nodes in the network, rather than the number of nodes 
squared, as in Eq. (191. (And the numerator can be eval¬ 
uated in time proportional to the number of nonzero Qij , 
which is small.) 


C. Belief propagation 


In principle, the methods of the previous section con¬ 
stitute a complete algorithm for fitting our model to ob¬ 
served network data. In practice, however, it is an im¬ 
practical one because it’s unreasonably slow. The bottle¬ 
neck is the sum in the denominator of Eq. (14), which is 


a sum over all possible assignments g of nodes to commu¬ 
nities. If there are n nodes and k communities then there 
are k n possible assignments, a number that grows with n 
so rapidly as to prohibit explicit numerical evaluation of 
the sum for all but the smallest of networks. 

This is not a new problem. It is common to most 
EM algorithms, not only for network applications but 
for statistics in general. The traditional way around it 
is to approximate the distribution q( g) by importance 
sampling using Markov chain Monte Carlo. In this paper, 
however, we use a different method, proposed recently 
by Decellc et al. Gang and specific to networks, namely 
belief propagation. 

Originally developed in physics and computer science 
for the probabilistic solution of problems on graphs and 
lattices nsaiaa, belief propagation is a message passing 
method in which the nodes of a network exchange mes¬ 
sages or “beliefs,” which are probabilities representing 
the current best estimate of the solution to the problem 
of interest. In the present case we define a message 
which is equal to the probability that node i belongs to 
community r if node j is removed from the network. The 
removal of a node is crucial, since it allows us to write 
a self-consistent set of equations satisfied by the mes¬ 
sages, whose solution gives us the distribution q( g) over 
group assignments. Although the equations can without 
difficulty be written exactly and in full, we will here ap¬ 
proximate them to leading order only in the small quan¬ 
tities uj rs . We find this approximation to give excellent 
results in our applications and the equations are consid¬ 
erably simpler, as well as giving a faster final algorithm. 


Within this approximation, the belief propagation equation for the message q l r ~^ J is: 


vr j = 7r 


z. 


'i->3 


6X p(-£ 

x k,s 


Qs Mrs 


n z 

H&) s 

Qik^o 


Qik^rs (I Qik'ji, 1 ^rs) 


1 - P 


where Z^j is a normalization coefficient that ensures Hr r i l H J = 1) having value 


■^->3 = £7r-exp( -^2 
k,s 


Qs Mrs 


n £ 

H¥=j) s 

Qik^o 


Qik^rs (I Qik'jQ M rs ) 


1 - P 


(27) 


(28) 


and ql is, as before, the one-node marginal probability of Eq. (12), which can itself be conveniently calculated directly 
from the messages via 


9r = ^ exp (-£9>r S ) n £ ? I 


3,s 


n 3~>i 


3 

Qij 


QijUJrs (1 Qij) M rs ) 


with 


Z i = £ 7r exp ( - ^ q&rs ) I| £ 17 




n , 


Qij^O 


1 ~ P 


Qij M rs ^ (1 Qij){ 1 M rs ) 


1 ~ P 


(29) 


(30) 


These equations are exact if the set of node pairs i,j with edge probabilities Qij > 0 forms a tree or is at 
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least locally tree-like (meaning that arbitrarily large local 
neighborhoods take the form of trees in the limit of large 
network size). For non-trees, which includes most real- 
world networks, they are only approximate, but previous 
results from a number of studies show the approximation 
to be a good one in practice [H HI 1201422] . 

Solution of these equations is by iteration. Typically 
we start from the current best estimate of the values of 
the beliefs and iterate to convergence, then from the con¬ 
verged values we calculate the crucial two-node marginal 
probability q]? s by noting that 


Qri = p (9i = r,gj = s\Qij) 

= P(9i = r i9j = s)P(Qij\gi = r,gj = s) 

Tn-s p (& = r, gj = s)P(Qij I g, = r, g 5 = s) 


• (31) 


where all data Q other than Qij are assumed given in 
each probability. The probabilities in these expressions 
are equal to 


P{gi = A gj = s) = rj^Ws^ 1 , 

P{Qij\gi = r,gj = s) = /?o(Qy)^ q— 


QijUJfs 

P 


(1 ft) i'j ) ( 1 ^rs) 

1 ~ P 


(33) 


in such cases the model can return poor performance on 
community detection tasks. 

The fix for this problem is straightforward. The degree- 
corrected stochastic block model is identical to the stan¬ 
dard block model except that the probability of an edge 
between nodes i,j that fall in groups r,s is didjL 0 rs (in¬ 
stead of just w rs ), where di is the observed degree of 
node i in the network. This modification allows the 
model to accurately fit arbitrary degree distributions, 
and community detection algorithms that perform fits to 
the degree-corrected model are found to return excellent 
results in real-world applications m- 

We can make the same modification to our methods 
as well. The developments follow exactly the same lines 
as for the ordinary (uncorrected) stochastic block model. 
The crucial equations (18) and (26) become 


t'J.= 


QijdidjiOrs/p 


QijdidjUj rs ]p T (1 f?o)( 1 dj i djUJ rs ')/(1 p) 


(32) and 


Cdrs — 


V n ij tV 

£-^ij ^rs rs 

d iQr Ej d j<A 


(35) 

(36) 


while the belief propagation equation (271 becomes 


Substituting these into (31), we get 


<W = 

-irs 


>lr w 

QijUJrs , (1 Qij)( 1 M rs ) 


p i -p 

2—>7 j—¥i 
ErsW J V i 

Qijui rs (l-Qij)(l-uvs) 

P 1-p 


(34) 


Our final algorithm then consists of alternately (a) it¬ 
erating the belief propagation equations (27) to conver¬ 


gence and using the results to calculate the marginal 
probabilities g* and q]? s from Eqs. (29) and (34), and 


(b) iterating Eqs. (18) and (26) to convergence to calcu¬ 


late new values of the w rs and using Eq. (16) to calculate 


new values of 7 r . In practice the algorithm is efficient— 
in other tests of belief propagation it has been found fast 
enough for applications to networks of a million nodes or 
more. 


vV = V 


J i->3 


exp ( -didj ^2 q\ 


UJ r 


n e 

H^j) s 

Qik^O 


QikdidjUJrs (1 ^i/c)(l d i d jUJ r s ) 


1 - P 


(37) 


with corresponding modifications to Eqs. (281 to (30) and 

Eq. d34h. 


In the following sections we describe a number of exam¬ 
ple applications of our methods . Am ong these, the tests 
on synthetic networks (Section |III A ) are performed us¬ 
ing the standard stochastic block model, without degree- 
correction, while the tests on real-world networks (Sec¬ 
tion III B) use the degree-corrected version. 


D. Degree-corrected model 

Our method gives a complete algorithm for fitting the 
standard stochastic block model to uncertain network 
data represented by the matrix Q of edge probabilities. 
As pointed out previously by Karrer and Newman m, 
however, the stochastic block model gives poor perfor¬ 
mance for community detection on many real-world net¬ 
works because the model assumes a Poisson degree distri¬ 
bution, which is strongly in conflict with the broad, fre¬ 
quently fat-tailed degree distributions seen in real-world 
networks. Because of this conflict it is often not possi¬ 
ble to find a good fit of the stochastic block model to 
observed network data, for any parameter values, and 


III. RESULTS 

We have tested the methods described in the previ¬ 
ous sections both on computer-generated benchmark net¬ 
works with known structure and on real-world examples. 

A. Synthetic networks 

Computer-generated or “synthetic” networks provide 
a controlled test of the performance of our algorithm. 
We generate networks with known community structure 
planted within them and then test whether the algorithm 
is able accurately to detect that structure. 










































For the tests reported here, we generate networks us¬ 
ing the standard (not degree-corrected) stochastic block 
model and then add noise to them to represent the net¬ 
work uncertainty, using functions /?o and /3i as defined in 
Section |H A| We use networks of size n = 4000 nodes, di¬ 
vided into two equally-size communities, and as the noise 
function j3\{Q) for the edges we use a beta distribution: 


/?! (Q) = 


Q ai ~ 1 (l-Q) bl 
B(oi, bi) 


bi-l 


(38) 


where B(a, b) is Euler’s beta function. As the noise func¬ 
tion Po{Q) for the non-edges we use a beta function plus 
an additional delta-function spike at zero: 


0o(Q) = c 


Q a ° ~\l-Q) b ° 
B(oq, bo) 


-1 


+ (1 — c)d(Q). (39) 


The delta function makes the matrix Q of edge proba¬ 
bilities realistically sparse, in keeping with the structure 
of real-world data sets, with a fraction 1 — c of non-edges 
having exactly zero probability in the observed data, on 
average. 

Thus there are a total of five parameters in our noise 
functions: do, b 0 , ai, &i, and c. Not all of these parame¬ 
ters are independent, however, because our functions still 
have to satisfy the constraint Q. Substituting Eqs. (38) 
and (391 into Q , we see that for the constraint to be sat¬ 
isfied for all Q > 0 we must have do = «i ~ 1, bo = bi + 1, 
and 


1 - p B(di,6i) 1 - p B(di,6i) 


P B(d 0 , bo) 

1 — p di — 1 


P B(di — 1, &i + 1) 


P 


bi 


(40) 


Thus there are really just two degrees of freedom in the 
choice of the noise functions. Once we fix the parameters 
di and bi, everything else is fixed also. Alternatively, we 
can fix the parameter c, thereby fixing the density of the 
data matrix Q, plus one or other of the parameters di 
and b\. 

The networks we generate are now analyzed using the 
non-degree-corrected algorithm of Sections |II A| to |II C| 
To quantify performance we assign each node i to the 
community r for which its probability q l r of membership, 
Eq. (Il2|), as computed by the algorithm, is greatest, then 


compare the result to the known true community assign¬ 
ments from which the network was generated. Success 
(or lack of it) is quantified by computing the fraction of 
nodes placed by the algorithm in the correct groups. We 
also compare the results against the naive (but common) 
thresholding method discussed in the introduction [2], 
in which edge probabilities Qij are turned into binary 
yes-or-no edges by cutting them off at some fixed thresh¬ 
old t, so that the adjacency matrix element Aij is 1 if 
and only if Qij > r. Community structure in the thresh- 
olded network is analyzed using the standard stochastic 
block model algorithm described in, for example, Refs. [BI 
and M. 


As we vary the parameters of the underlying net¬ 
work and noise functions the performance of both algo¬ 
rithms varies. When the community structure is strong 
and the noise is weak both algorithms (not surprisingly) 
do well, recovering the community structure nearly per¬ 
fectly, while for weak enough community structure or 
strong noise neither algorithm does better than chance. 
But, as shown in Fig. [2ji, there is a regime of interme¬ 
diate structure and noise in which our algorithm does 
significantly better than the naive technique. The fig¬ 
ure shows the fraction of correctly classified nodes in the 
naive algorithm as a function of the threshold r (data 
points in the figure) compared against the performance 
of the algorithm of this paper (dashed line) and, as we 
can see, the latter outperforms the former no matter what 
value of t is used. Note that the worst possible perfor¬ 
mance still classifies a half of the nodes correctly—even 
a random coin toss would get this many right—so this 
is the minimum value on the plot. For high threshold 
values r approaching one, the threshold method throws 
away essentially all edges, leaving itself no data to work 
with, and hence does little better than chance. Con¬ 
versely for low thresholds the threshold method treats 
any node pair with a nonzero connection probability Qij 
as having an edge, even when an edge is wildly unlikely, 
thereby introducing large amounts of noise into the cal¬ 
culation that again reduce performance to a level little 
better than chance. The optimal performance falls some¬ 
where between these two extremes, around r = 0.25 in 
this case, but even at this optimal point the thresholding 
method’s performance falls far short of the algorithm of 
this paper. 

Figure [2]:) shows a different test of the method. Again 
we use networks generated from a stochastic block model 
with two groups and calculate the fraction of correctly 
classified nodes. Now, however, we vary the amount of 
noise introduced into the network to test the algorithm’s 
ability to recover structure in data of varying quality. 
The parameters of the underlying network are held con¬ 
stant, as is the parameter c that controls the sparsity of 
the data matrix Q. This leaves only one degree of free¬ 
dom, which we take to be the parameter b\ of the noise 
process (see Eq. (38)). 

A network with little noise in the data is one in which 
true edges in the underlying network are represented by 
probabilities Qij close to 1, in other words by a noise dis¬ 
tribution Pi(Q) with most of its weight close to 1. Such 
distributions correspond to small values of the param¬ 
eter b\. Noisier data are those in which the values of 
the Qij are smaller, approaching the values for the non¬ 
edges, thereby making it difficult to distinguish between 
edges and non-edges. These networks are generated by 
larger values of b\. Figure [2jr shows the fraction of cor¬ 
rectly classified nodes as a function of fo, so the noise 
level is increasing, and the quality of the simulated data 
decreasing, from left to right in the figure. 

As we can see, the algorithm returns close to perfect 
results when b\ is small- meaning that the quality of the 
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FIG. 2: Tests of the method described in this paper on syn¬ 
thetic benchmark networks, (a) Fraction of nodes placed 
in the correct community for uncertain networks generated 
using a stochastic block model with n = 4000 nodes, two 
groups of equal size, edge probabilities uon = u>22 = 0.02, 
uj 12 = UJ 21 = 0.014, and noise parameters ai = 1.4 and 
bi = 2 (see Eq. (38)). The horizontal dashed line shows the 
performance of the algorithm described in this paper. The 
points show the performance of a naive algorithm in which 
the uncertain network is first converted to a binary network 
by thresholding the edge probabilities and the result then 
fed into a standard community detection algorithm. The re¬ 
sults for each algorithm are averaged over 20 repetitions of 
the experiment with different networks. Statistical errors are 
comparable in size to the data points, (b) Fraction of nodes 
classified into their correct communities for stochastic block 
model networks with varying amounts of noise in the data. 
The parameters are the same as for (a ) bu t with the spar¬ 
sity parameter c fixed at l/4n (see Eq. (391 and the ensuing 
discussion) and varying the parameter b\, which controls the 
level of noise in the data. 


data is high and the algorithm almost sees the true un¬ 
derlying structure of the network. Performance degrades 
as the noise level increases, although the algorithm con¬ 
tinues to do significantly better than chance even for high 
levels of noise, indicating that there is still useful infor¬ 
mation to be extracted even from rather poor data sets. 


B. Protein interaction network 

As a real-world example of our methods we have ap¬ 
plied them to protein-protein interaction networks from 
the STRING database [23] . This database contains pro¬ 
tein interaction information for 1133 species drawn from 
a large body of research literature covering a range of 
different techniques, including direct interaction exper¬ 
iments, genomic information, and cross-species compar¬ 
isons. The resulting networks are of exactly the form con¬ 
sidered in this paper. For each network there is assumed 
to be a true underlying network in which every pair of 


proteins either interacts or doesn’t, but, given the un¬ 
certainty in the data on which they are based, STRING 
provides only probabilistic estimates of the presence of 
each interaction. Thus the data we have for each species 
consists of a set of proteins—the nodes—plus a likelihood 
of interaction for each protein pair. A significant major¬ 
ity of protein pairs in each of the networks are recorded 
as having zero probability of interaction, so the network 
is sparse in the sense assumed by our analysis. 

We analyze the data using the degree-corrected version 
of our algorithm described in Section |IID[ which is ap¬ 
propriate because the networks in the STRING database, 
like most real-world networks, have broad degree distri¬ 
butions. 

Figure [3ja shows the communities found in a three-way 
split of the protein-protein interaction network of the 
bacterium Borrelia hermsii HS1. Node colors denote the 
strongest community affiliation for each node, as quanti¬ 
fied by the one-node marginal probability q l r , with node 
size being proportional to the probability a node is in 
its most likely community (so that larger nodes are more 
certain). In practice, most nodes belong wholly to just 
one community. 

For comparison, we also show in Fig. §> the communi¬ 
ties found in the same network by the naive thresholding 
algorithm discussed earlier in which a node pair i,j is 
considered connected by an edge if and only if the prob¬ 
ability Qij exceeds a certain threshold, which here is set 
at 0.25. By contrast with the synthetic networks of the 
previous section, we do not know the true underlying 
communities for this network and so cannot calculate the 
fraction of correctly classified nodes, but it is clear from 
the figures that the new technique gives significantly dif¬ 
ferent results from the thresholding method, particularly 
for the community that appears in the upper right of the 
figure. 

A closer examination of the data reveals a possible ex¬ 
planation. The communities at the left and bottom in 
both panels of Fig.[3]consist primarily of high-probability 
edges and are easily identified in the data, so it is perhaps 
not surprising that both algorithms identify these com¬ 
munities readily and are largely in agreement. However, 
the third community, in the upper right of the figure, 
consists largely of edges of relatively low probability and 
the thresholding method has more difficulty with this 
case because many edges fall below the threshold value 
and so are lost, which may explain why the thresholding 
method divides the nodes of this community among the 
three groups. 

To give a simple picture, imagine a community whose 
nodes are connected by very many internal edges, but all 
of those edges have low probability. Because there are so 
many of them, the total expected number of true inter¬ 
nal edges in the underlying network —the number of node 
pairs times the average probability of connection—could 
be quite high, high enough to create a cohesive network 
community. Our algorithm, which takes edge probabil¬ 
ities into account, will allow for this. The thresholding 
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(a) Method of this paper (b) Thresholding method 


FIG. 3: Communities found by (a) the algorithm described in this paper and (b) the thresholding algorithm, in a three-way 
split of the protein interaction network of the bacterium Borrelia hermsii HS1, taken from the STRING database. Nodes are 
laid out according to the communities in (a) and the layout is the same in both panels. 


algorithm on the other hand can fail because the edges 
all have low probability, below the threshold used by the 
algorithm, and hence are discarded. The result is that 
the thresholding algorithm sees no edges at all and hence 
no community. The fundamental problem is that thresh¬ 
olding is just too crude a tool to see subtle patterns in 
noisy data. 


IV. EDGE RECOVERY 

A secondary goal in our analysis of uncertain networks 
is to deduce the structure of the (unobserved) underly¬ 
ing network from the uncertain data. That is, given the 
matrix Q of edge probabilities, can we make an informed 
guess about the adjacency matrix A? We call this the 
edge recovery problem. It is related to, but distinct from, 
the well studied link prediction problem 1241 . in which 
one is given a binary network of edges and non-edges but 
some of the data may be erroneous and the problem is to 
guess which ones. In the problem we consider, by con¬ 
trast, the data given are assumed to be correct, but they 
are incomplete in the sense of being only the probabilities 
of the edges, rather the edges themselves. 

The simplest approach in the present case is simply to 
use the edge probabilities Qij themselves to predict the 
edges—those node pairs i,j with the highest probabili¬ 
ties are assumed most likely to be connected by edges. 
But if we know, or believe, that our network contains 
community structure, then we can do a better job. If we 


know where the communities in the network lie, at least 
approximately, then given two pairs of nodes with similar 
values of Qij, the pair that are in the same community 
should be more likely to be connected by an edge than 
the pair that are not (assuming “assortative” mixing in 
which edge probabilities are higher inside communities). 

It turns out that our EM algorithm gives us precisely 
the information we need to perform edge recovery. The 
(posterior) probability of having an edge between any 
pair of nodes i,j can be written as 

P(A tJ = 1 ) 

= P i A H = 1 \ 9i = r,9j = s)P{gi = r,gj = s ) 

rs 

= £*&& (41) 


where the data Q and the parameters y, u) are assumed 
given in each probability and we have made use of 
Eq. (24) and the definition of q]? s . Both t]? s and q l J s are 
calculated in the course of running the EM algorithm, 
so we already have these quantities available to us and 
calculating P(A,j = 1) is a small extra step. 

Figure [4] shows a test of the accuracy of our edge pre¬ 
dictions using synthetic test networks once again. In 
these tests we generate networks with community struc¬ 
ture using the standard stochastic block model, as previ¬ 
ously, then run the network through the EM algorithm 
and calculate the posterior edge probabilities of Eq. (41) 
above. We compare the results against competing pre¬ 
dictions based on the prior edge probabilities Qij alone. 
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FIG. 4: Receiver operating characteristic (ROC) curves for 
the edge recovery problem on a synthetic network generated 
using a two-group stochastic block model with n = 4000 
nodes, wn = u> 22 = 0.05, CJ 12 = W 21 = 0.001, and noise 
parameters b 1 = 4 and c = 1/4 n. The three curves show 
the performance of the algorithm of this paper, the naive 
algorithm based on the raw probabilities Qij alone, and a 
hypothetical “ideal” algorithm that knows the values of the 
parameters used to generate the model (so that one does not 
have to run the EM algorithm at all). The diagonal dashed 
line represents is curve generated by an algorithm that does 
no better than chance. 

The figure shows receiver operating characteristic 
(ROC) curves of the results. To construct an ROC curve, 
one asks how many edges we would get right, and how 
many wrong, if we were to simply predict that the frac¬ 
tion x of node pairs with the highest probabilities of con¬ 
nection are in fact connected by edges. The ROC curve 
is the plot of the fraction of such predictions that turn 
out right (true positives) against the fraction wrong (false 
positives) for values of x from zero to one. By definition 
the curve always lies on or above the 45-degree line and 
the higher the curve the better the results, since a higher 
curve implies more true positives and fewer false ones. 

Figure [4] shows the ROC curves both for our method 
and for the naive method based on the raw probabili¬ 
ties Qij alone and we can see that, for the particular net¬ 
works studied here, the additional information revealed 
by fitting the block model results in a substantial im¬ 
provement in our ability to identify the edges of the net¬ 
work correctly. One common way to summarize the in¬ 
formation contained in an ROC curve is to calculate the 
area under the curve, where an area of 0.5 corresponds 
to the poorest possible results—no better than a random 
guess—and an area of 1 corresponds to perfect edge re¬ 
covery. For the example shown in Fig. [4] the area under 
the curve for our algorithm is 0.89 while that for the naive 
algorithm is significantly lower at 0.80. 


Also shown in the figure is a third curve representing 
performance on the edge recovery task if we assume we 
know the exact parameters of the stochastic block model 
that were used to generate the network, i.e., that we don’t 
need to run the EM algorithm to learn the parameter val¬ 
ues. This is an unrealistic situation—we very rarely know 
such parameters in the real world— but it represents the 
best possible prediction we could hope to make under any 
circumstances. And, as the figure shows, this best pos¬ 
sible performance is in this case indistinguishable from 
the performance of our EM algorithm, indicating that 
the EM algorithm is performing the edge recovery task 
essentially optimally in this case. 


V. CONCLUSIONS 

In this paper we have described methods for the anal¬ 
ysis of networks represented by uncertain measurements 
of their edges. In particular we have described a method 
for performing the common task of community detec¬ 
tion on such networks by fitting a generative network 
model to the data using a combination of an expectation- 
maximization (EM) algorithm and belief propagation. 
We have also shown how the resulting fit can be used to 
reconstruct the true underlying network by making pre¬ 
dictions of which nodes are connected by edges. Using 
controlled tests on computer-generated benchmark net¬ 
works, we have shown that our methods give better re¬ 
sults than previously used techniques that rely on simple 
thresholding of probabilities to turn indefinite networks 
into definite ones. And we have given an example appli¬ 
cation of our methods to a bacterial protein interaction 
network taken from the STRING database. 

The methods described in this paper could be extended 
to the detection of other types of structure in networks. If 
one can define a generative model for a structure of inter¬ 
est then the developments of Section |TT] can be applied, 
simply replacing the likelihood P(A,g| 7 , a;) in Eq. ([T]) 
with the appropriate probability of generation. Genera¬ 
tive models have been recently proposed for hierarchical 
structure in networks [4], overlapping communities 125] . 
ranking or stratified structure }26| . and others. In prin¬ 
ciple, our methods could be extended to any of these 
structure types in uncertain networks. 
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